54 research outputs found

    Linear pattern matching on sparse suffix trees

    Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse suffix tree \cite{KU-96} with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to $\log_{\sigma} n$ characters ($\sigma$ being the alphabet size), our index takes $O(n/\log_{\sigma} n)$ space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time $O(m + r^2 + r \cdot occ)$, where $m$ is the length of the pattern, $r$ is the actual number of characters stored in a word and $occ$ is the number of pattern occurrences.
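
    The packing idea in the abstract above can be illustrated with a minimal sketch. It only shows how a string is stored with a fixed number of characters per word; it is not the sparse suffix tree index of the paper. The function names `pack` and `unpack` and the parameter `chars_per_word` (playing the role of $r$) are assumptions made for this illustration.

```python
def pack(text, alphabet, chars_per_word):
    """Pack `text` into integers, each holding up to `chars_per_word` characters."""
    # Bits needed per character: ceil(log2(sigma)), at least 1.
    bits = max(1, (len(alphabet) - 1).bit_length())
    code = {c: i for i, c in enumerate(alphabet)}
    words = []
    for start in range(0, len(text), chars_per_word):
        word = 0
        for c in text[start:start + chars_per_word]:
            word = (word << bits) | code[c]
        words.append(word)
    return words, bits


def unpack(words, bits, alphabet, chars_per_word, length):
    """Invert `pack`; `length` is the original string length."""
    mask = (1 << bits) - 1
    out = []
    for i, word in enumerate(words):
        # The last word may hold fewer than chars_per_word characters.
        count = min(chars_per_word, length - i * chars_per_word)
        for j in range(count):
            shift = bits * (count - 1 - j)
            out.append(alphabet[(word >> shift) & mask])
    return "".join(out)


if __name__ == "__main__":
    packed, bits = pack("ACGTACGTAC", "ACGT", 4)
    print(packed, bits)
    print(unpack(packed, bits, "ACGT", 4, 10))
```

    With a 4-letter alphabet each character takes 2 bits, so a 10-character string fits into three words of 4 characters each, the last one only partly filled.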

    On the number of Dejean words over alphabets of 5, 6, 7, 8, 9 and 10 letters

    We give lower bounds on the growth rate of Dejean words, i.e. minimally repetitive words, over a k-letter alphabet, for k=5, 6, 7, 8, 9, 10. Put together with the known upper bounds, we estimate these growth rates with a precision of 0.005. As a consequence, we establish the exponential growth of the number of Dejean words over a k-letter alphabet, for k=5, 6, 7, 8, 9, 10. Comment: 13 pages

    Finding approximate repetitions under Hamming distance

    The problem of computing tandem repetitions with $K$ possible mismatches is studied. Two main definitions are considered, and for both of them an $O(nK\log K + S)$ algorithm is proposed ($S$ being the size of the output). This improves, in particular, the bound obtained in \cite{LS93}. Finally, other possible definitions are briefly analyzed.
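
    To make the problem statement concrete, here is a brute-force sketch. It assumes the simplest reading of a tandem repetition with $K$ mismatches: a factor $uv$ with $|u| = |v|$ whose two halves differ in at most $K$ positions. The abstract does not spell out its two definitions, so this reading is an assumption. The function name is hypothetical, and the cubic running time is far from the $O(nK\log K + S)$ bound of the paper.

```python
def approx_tandem_repeats(text, k):
    """Return (start, period) for every factor uv with |u| = |v| = period
    whose halves differ in at most k positions (Hamming distance <= k)."""
    n = len(text)
    found = []
    for period in range(1, n // 2 + 1):
        for start in range(0, n - 2 * period + 1):
            # Compare the two halves position by position.
            mismatches = sum(
                text[start + i] != text[start + period + i]
                for i in range(period)
            )
            if mismatches <= k:
                found.append((start, period))
    return found


if __name__ == "__main__":
    print(approx_tandem_repeats("abcabd", 0))
    print(approx_tandem_repeats("abcabd", 1))
```

    On `"abcabd"` no exact square exists, while allowing one mismatch reports the factor starting at position 0 with period 3 (`abc` against `abd`).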

    On the sum of exponents of maximal repetitions in a word

    Internal report. This paper continues the study presented in \cite{KolpakovKucherovRI98}, where it was proved that the number of maximal repetitions in a word is linearly bounded in the word length. Here we strengthen this result and prove that the sum of exponents of maximal repetitions is linearly bounded too. Similarly to \cite{KolpakovKucherovRI98}, we first estimate the sum of exponents of maximal repetitions in Fibonacci words. Then we prove that the sum of exponents of all maximal repetitions in general words is linearly bounded. Finally, some algorithmic applications of these results are discussed.
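
    The quantities discussed above can be computed directly on small words. The sketch below is a brute-force enumeration, not the method of the paper. It assumes the usual definition of a maximal repetition: a factor with smallest period $p$ and length at least $2p$ that cannot be extended in either direction while keeping period $p$, with exponent equal to its length divided by $p$. All function names are hypothetical, and the Fibonacci words are generated with the convention $f_0 = b$, $f_1 = a$, $f_k = f_{k-1} f_{k-2}$.

```python
from fractions import Fraction


def maximal_repetitions(word):
    """Return (start, end, period) for each maximal repetition, end inclusive."""
    n = len(word)
    seen = set()
    runs = []
    # Periods are tried in increasing order, so an interval is first
    # recorded with its smallest period; later multiples are skipped.
    for period in range(1, n // 2 + 1):
        t = 0
        while t + period < n:
            if word[t] != word[t + period]:
                t += 1
                continue
            start = t
            while t + period < n and word[t] == word[t + period]:
                t += 1
            # Positions start..t-1 equal their copies shifted by `period`.
            end = t - 1 + period
            if t - start >= period and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, period))
    return runs


def sum_of_exponents(word):
    """Sum of length/period over all maximal repetitions of `word`."""
    return sum(
        Fraction(end - start + 1, period)
        for start, end, period in maximal_repetitions(word)
    )


def fibonacci_word(k):
    """Fibonacci word f_k with f_0 = 'b', f_1 = 'a', f_k = f_{k-1} + f_{k-2}."""
    a, b = "b", "a"
    for _ in range(k):
        a, b = b, b + a
    return a


if __name__ == "__main__":
    w = fibonacci_word(5)
    print(w, maximal_repetitions(w), sum_of_exponents(w))
```

    For the Fibonacci word `abaababa` this finds three maximal repetitions (`aa`, `ababa` and `abaaba`) with exponents 2, 5/2 and 2.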

    On repetition-free binary words of minimal density

    Conference with proceedings and peer review. In \cite{KolpakovKucherovMFCS97}, a notion of minimal proportion (density) of one letter in $n$-th power-free binary words was introduced and some of its properties were proved. In this paper, we proceed with this study and substantially extend some of these results. First, we introduce and analyse a general notion of minimal letter density for any infinite set of words that do not contain a specified set of "prohibited" subwords. We then prove that for $n$-th power-free binary words, the density function is $\frac{1}{n}+\frac{1}{n^3}+\frac{1}{n^4}+\mathcal{O}(\frac{1}{n^5})$, refining the estimate from \cite{KolpakovKucherovMFCS97}. Following \cite{KolpakovKucherovMFCS97}, we also consider a natural generalization of $n$-th power-free words to $x$-th power-free words for real argument $x$. We prove that the minimal proportion of one letter in $x$-th power-free binary words, considered as a function of $x$, is discontinuous at all integer points $n \geq 3$. Finally, we give an estimate of the size of the jumps.